Reinforcement Learning for Games: Failures and Successes. CMA-ES and TDL in Comparison
Authors
Abstract
We apply CMA-ES, an evolution strategy with covariance matrix adaptation, and TDL (Temporal Difference Learning) to reinforcement learning tasks. In both cases these algorithms seek to optimize a neural network which provides the policy for playing a simple game (TicTacToe). Our contribution is to study the effect of varying learning conditions on learning speed and quality. Certain initial failures with ill-suited fitness functions led to the development of new fitness functions, which allow fast learning. These new fitness functions in combination with CMA-ES reduce the number of games needed for training to the same order of magnitude as TDL. The selection of suitable features is also of critical importance for the learning success. We show that using the raw board position as an input feature is not very effective: it is orders of magnitude slower than feature sets which exploit the symmetry of the game. We develop a measure, "feature set utility" (FU), which allows a given feature set to be characterized in advance. We show that the lower bound provided by FU is largely in accordance with the results from our repeated experiments for two very different learning algorithms, CMA-ES and TDL.
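As one illustration of why symmetry-exploiting features help, the following sketch (not the paper's implementation; the board encoding and function names are our own) maps every TicTacToe position to a canonical representative under the board's eight symmetries (four rotations, each with an optional mirror), so that symmetric positions share a single feature vector:

```python
def rotate(board):
    """Rotate a 3x3 board (tuple of 9 cells, row-major) 90 degrees clockwise."""
    return tuple(board[6 - 3 * (i % 3) + i // 3] for i in range(9))

def reflect(board):
    """Mirror the board left-right."""
    return tuple(board[3 * (i // 3) + 2 - i % 3] for i in range(9))

def canonical(board):
    """Lexicographically smallest board among all 8 symmetric variants."""
    variants, b = [], tuple(board)
    for _ in range(4):
        variants += [b, reflect(b)]
        b = rotate(b)
    return min(variants)
```

Since symmetric positions collapse onto one representative, the learner has to estimate values for far fewer distinct states, which is one plausible reason symmetry-aware feature sets train faster than the raw board encoding.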
Similar Papers
A Covariance Matrix Adaptation Evolution Strategy for Direct Policy Search in Reproducing Kernel Hilbert Space
The covariance matrix adaptation evolution strategy (CMA-ES) is an efficient derivative-free optimization algorithm. It optimizes a black-box objective function over a well-defined parameter space. In some problems, such parameter spaces are defined using function approximation in which feature functions are manually defined. Therefore, the performance of those techniques strongly depends on the...
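To make the black-box setting concrete, here is a deliberately simplified evolution strategy (isotropic Gaussian sampling with a crude step-size decay, not full CMA-ES with covariance adaptation); it shows the sample-evaluate-recombine loop that CMA-ES also follows, with all parameter values chosen for illustration only:

```python
import random

def simple_es(f, x0, sigma=0.3, pop=20, elite=5, iters=300, seed=0):
    """Minimise a black-box f by sampling around a mean and averaging the elite."""
    rng = random.Random(seed)
    mean = list(x0)
    for _ in range(iters):
        # sample a population of candidate parameter vectors around the mean
        samples = [[m + rng.gauss(0.0, sigma) for m in mean] for _ in range(pop)]
        samples.sort(key=f)  # best (lowest f) first
        # recombination: new mean is the average of the elite samples
        mean = [sum(c) / elite for c in zip(*samples[:elite])]
        sigma *= 0.98  # fixed geometric decay; CMA-ES instead adapts step size online
    return mean
```

The key difference in real CMA-ES is that the sampling distribution's full covariance matrix is adapted from the selected samples, so the search automatically aligns with correlated directions in parameter space.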
Uncertainty Handling in Evolutionary Direct Policy Search
Uncertainty arises in reinforcement learning from various sources. Therefore it is necessary to consider statistics based on several roll-outs for evaluating behavioral policies. An adaptive uncertainty handling is added to the CMA-ES, a variable metric evolution strategy proposed for direct policy search. The uncertainty handling dynamically adjusts the number of episodes considered in each ev...
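The need for several roll-outs can be seen in a toy model (the names and the Gaussian noise model below are our own illustration, not from the paper): a single noisy episode return is a high-variance fitness estimate, while averaging n episodes shrinks the estimator's standard deviation by roughly a factor of 1/sqrt(n):

```python
import random

def rollout(policy_strength, rng):
    """Hypothetical noisy episode return for a policy of the given true strength."""
    return policy_strength + rng.gauss(0.0, 1.0)

def fitness(policy_strength, n_episodes, rng):
    """Fitness estimate: average return over n_episodes independent roll-outs."""
    return sum(rollout(policy_strength, rng) for _ in range(n_episodes)) / n_episodes
```

An adaptive scheme such as the one described above would increase n_episodes only when the noise actually threatens to flip the ranking of candidate policies, saving evaluations otherwise.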
Evolutionary reinforcement learning of artificial neural networks
In this article we describe EANT2, Evolutionary Acquisition of Neural Topologies, Version 2, a method that creates neural networks by evolutionary reinforcement learning. The structure of the networks is developed using mutation operators, starting from a minimal structure. Their parameters are optimised using CMA-ES, Covariance Matrix Adaptation Evolution Strategy, a derandomised variant of ev...
Reinforcement Learning in Repeated Interaction Games
We study long run implications of reinforcement learning when two players repeatedly interact with one another over multiple rounds to play a finite action game. Within each round, the players play the game many successive times with a fixed set of aspirations used to evaluate payoff experiences as successes or failures. The probability weight on successful actions is increased, while failures ...
Self-Organisation of Neural Topologies by Evolutionary Reinforcement Learning
In this article we present EANT, “Evolutionary Acquisition of Neural Topologies”, a method that creates neural networks (NNs) by evolutionary reinforcement learning. The structure of NNs is developed using mutation operators, starting from a minimal structure. Their parameters are optimised using CMA-ES. EANT can create NNs that are very specialised; they achieve a very good performance while b...
Publication year: 2009